An automated classification approach to ranking photospheric proxies of magnetic energy build-up
We study the photospheric magnetic field of ~2000 active regions in solar
cycle 23 to search for parameters indicative of energy build-up and subsequent
release as a solar flare. We extract three sets of parameters: snapshots in
space and time (total flux, magnetic gradients, and neutral lines); evolution in
time (flux evolution); and structures at multiple size scales (wavelet analysis).
This combines pattern recognition and classification techniques via a relevance
vector machine to determine whether a region will flare. We consider
classification performance using all 38 extracted features and several feature
subsets. Classification performance is quantified using both the true positive
rate and the true negative rate. Additionally, we compute the true skill score
which provides an equal weighting to true positive rate and true negative rate
and the Heidke skill score to allow comparison to other flare forecasting work.
We obtain a true skill score of ~0.5 for any predictive time window in the
range 2-24hr, with a TPR of ~0.8 and a TNR of ~0.7. These values do not appear
to depend on the time window, although the Heidke skill score (<0.5) does.
Features relating to snapshots of the distribution of magnetic gradients show
the best predictive ability over all predictive time windows. Other
gradient-related features and the instantaneous power at various wavelet scales
also feature in the top five ranked features in predictive power. While the
photospheric magnetic field governs the coronal non-potentiality (and
likelihood of flaring), photospheric magnetic field alone is not sufficient to
determine this uniquely. Furthermore, we are only measuring proxies of the
magnetic energy build-up. We still lack observational details on why energy is
released at any particular point in time. We may have discovered the natural
limit of the accuracy of flare predictions from these large-scale studies.
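The skill scores named in this abstract can be computed from a 2x2 contingency table. The sketch below uses the standard definitions (true skill score = TPR + TNR - 1; Heidke skill score = improvement over chance agreement). The function name and the counts are hypothetical, chosen only so that TPR = 0.8 and TNR = 0.7 as quoted above; they are not data from the paper.

```python
def skill_scores(tp, fn, fp, tn):
    """Return (TPR, TNR, TSS, HSS) from a 2x2 contingency table."""
    tpr = tp / (tp + fn)          # true positive rate
    tnr = tn / (tn + fp)          # true negative rate
    tss = tpr + tnr - 1.0         # true skill score
    n = tp + fn + fp + tn
    # Number of correct forecasts expected by chance, given the marginals.
    expected = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / n
    hss = (tp + tn - expected) / (n - expected)   # Heidke skill score
    return tpr, tnr, tss, hss


# Hypothetical counts: 100 flaring and 900 non-flaring regions.
imbalanced = skill_scores(tp=80, fn=20, fp=270, tn=630)
# Same TPR and TNR, but 500 flaring and 500 non-flaring regions.
balanced = skill_scores(tp=400, fn=100, fp=150, tn=350)

print("imbalanced:", imbalanced)
print("balanced:  ", balanced)
```

In both hypothetical tables the true skill score is 0.5, while the Heidke skill score is about 0.24 for the imbalanced table and 0.5 for the balanced one. This shows only that, for fixed TPR and TNR, the Heidke score changes with the event rate while the true skill score does not.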
Learning with Biased Complementary Labels
In this paper, we study the classification problem in which we have access to
an easily obtainable surrogate for true labels, namely complementary labels, which
specify classes that observations do \textbf{not} belong to. Let $Y$ and $\bar{Y}$
be the true and complementary labels, respectively. We first model
the annotation of complementary labels via transition probabilities
$P(\bar{Y}=i \mid Y=j)$, $i \neq j \in \{1, \cdots, c\}$, where $c$ is the number of
classes. Previous methods implicitly assume that $P(\bar{Y}=i \mid Y=j)$, $\forall i \neq j$, are identical, which is not true in practice because humans are
biased toward their own experience. For example, as shown in Figure 1, if an
annotator is more familiar with monkeys than prairie dogs when providing
complementary labels for meerkats, she is more likely to employ "monkey" as a
complementary label. We therefore reason that the transition probabilities will
be different. In this paper, we propose a framework that contributes three main
innovations to learning with \textbf{biased} complementary labels: (1) It
estimates transition probabilities with no bias. (2) It provides a general
method to modify traditional loss functions and extends standard deep neural
network classifiers to learn with biased complementary labels. (3) It
theoretically ensures that the classifier learned with complementary labels
converges to the optimal one learned with true labels. Comprehensive
experiments on several benchmark datasets validate the superiority of our
method to current state-of-the-art methods. Comment: ECCV 2018 Ora
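One way a standard loss could be modified to use biased complementary labels is a forward-style correction: push the classifier's class probabilities through a transition matrix and apply cross-entropy to the complementary label. This is a minimal sketch under that assumption, not the paper's stated formulation. The matrix `Q`, with `Q[j][i]` standing for the probability of complementary label `i` given true class `j`, and its values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def complementary_probs(true_probs, Q):
    """Map P(true class | x) to P(complementary label | x) through Q."""
    c = len(true_probs)
    return [sum(true_probs[j] * Q[j][i] for j in range(c)) for i in range(c)]


def complementary_cross_entropy(logits, comp_label, Q):
    """Cross-entropy between predicted complementary distribution and label."""
    q = complementary_probs(softmax(logits), Q)
    return -math.log(q[comp_label])


# Hypothetical classes: 0 = meerkat, 1 = monkey, 2 = prairie dog.
# Row 0 is biased: for a meerkat, "monkey" is the likelier complementary label.
Q = [
    [0.0, 0.7, 0.3],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

# A meerkat image annotated with the complementary label "monkey" (1).
loss_meerkat = complementary_cross_entropy([3.0, 0.0, 0.0], 1, Q)
loss_monkey = complementary_cross_entropy([0.0, 3.0, 0.0], 1, Q)
print(loss_meerkat, loss_monkey)
```

The diagonal of `Q` is zero because a complementary label never equals the true class, so a prediction of "monkey" is penalised when "monkey" is the complementary label.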
Retrospective suspect and non-target screening combined with similarity measures to prioritize MDMA and amphetamine synthesis markers in wastewater
3,4-Methylenedioxymethamphetamine (MDMA) and amphetamine are commonly used psychoactive stimulants. Illegal manufacture of these substances, mainly located in the Netherlands and Belgium, generates large amounts of chemical waste which is disposed of in the environment or released in sewer systems. Retrospective analysis of high-resolution mass spectrometry (HRMS) data was implemented to detect synthesis markers of MDMA and amphetamine production in wastewater samples. Specifically, suspect and non-target screening, combined with a prioritization approach based on similarity measures between detected features and mass loads of MDMA and amphetamine, was implemented. Two hundred and thirty-five 24 h-composite wastewater samples collected from a treatment plant in the Netherlands between 2016 and 2018 were analyzed by liquid chromatography coupled to high-resolution mass spectrometry. Samples were initially separated into two groups (i.e., baseline consumption versus dumping) based on daily loads of MDMA and amphetamine. Significance testing and fold-changes were used to find differences between features in the two groups. Then, associations between peak areas of all features and MDMA or amphetamine loads were investigated across the whole time series using various measures (Euclidean distance, Pearson's correlation coefficient, Spearman's rank correlation coefficient, distance correlation and maximum information coefficient). This unsupervised and unbiased approach was used for prioritization of features and allowed the selection of 28 presumed markers of production of MDMA and amphetamine. These markers could potentially be used to detect dumps in sewer systems, help in determining the synthesis route and track down the waste in the environment.
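Two of the similarity measures listed in this abstract, Pearson's and Spearman's correlation coefficients, can respond differently to the same pair of time series. The sketch below uses made-up numbers, not data from the study. The helper names are hypothetical, and the simple ranking assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank values from 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up daily drug loads and peak areas of one candidate feature.
drug_load = [1.0, 2.0, 3.0, 4.0, 5.0]
peak_area = [1.0, 4.0, 9.0, 16.0, 25.0]   # monotonic but nonlinear

print("Pearson: ", pearson(drug_load, peak_area))
print("Spearman:", spearman(drug_load, peak_area))
```

Here the feature rises monotonically but nonlinearly with the load, so the rank-based measure reports a perfect association while Pearson's coefficient is slightly below 1. Combining several measures can therefore rank candidate features differently than any single one.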
Structured Random Matrices
Random matrix theory is a well-developed area of probability theory that has
numerous connections with other areas of mathematics and its applications. Much
of the literature in this area is concerned with matrices that possess many
exact or approximate symmetries, such as matrices with i.i.d. entries, for
which precise analytic results and limit theorems are available. Much less well
understood are matrices that are endowed with an arbitrary structure, such as
sparse Wigner matrices or matrices whose entries possess a given variance
pattern. The challenge in investigating such structured random matrices is to
understand how the given structure of the matrix is reflected in its spectral
properties. This chapter reviews a number of recent results, methods, and open
problems in this direction, with a particular emphasis on sharp spectral norm
inequalities for Gaussian random matrices. Comment: 46 pages; to appear in IMA Volume "Discrete Structures: Analysis and
Applications" (Springer
PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers
The aim of this paper is to generalize the PAC-Bayesian theorems proved by
Catoni in the classification setting to more general problems of statistical
inference. We show how to control the deviations of the risk of randomized
estimators. Particular attention is paid to randomized estimators drawn in a
small neighborhood of classical estimators, whose study leads to control of the
risk of the latter. These results allow us to bound the risk of very general
estimation procedures, as well as to perform model selection.
Maximum-Reward Motion in a Stochastic Environment: The Nonequilibrium Statistical Mechanics Perspective
We consider the problem of computing the maximum-reward motion in a reward field in an online setting. We assume that the robot has a limited perception range, and it discovers the reward field on the fly. We analyze the performance of a simple, practical lattice-based algorithm with respect to the perception range. Our main result is that, with very little perception range, the robot can collect as much reward as if it could see the whole reward field, under certain assumptions. Along the way, we establish novel connections between this class of problems and certain fundamental problems of nonequilibrium statistical mechanics. We demonstrate our results in simulation examples.
A population Monte Carlo scheme with transformed weights and its application to stochastic kinetic models
This paper addresses the problem of Monte Carlo approximation of posterior
probability distributions. In particular, we have considered a recently
proposed technique known as population Monte Carlo (PMC), which is based on an
iterative importance sampling approach. An important drawback of this
methodology is the degeneracy of the importance weights when the dimension of
either the observations or the variables of interest is high. To alleviate this
difficulty, we propose a novel method that performs a nonlinear transformation
on the importance weights. This operation reduces the weight variation and
hence avoids weight degeneracy and increases the efficiency of the importance
sampling scheme, especially when drawing from proposal functions that are
poorly adapted to the true posterior.
For the sake of illustration, we have applied the proposed algorithm to the
estimation of the parameters of a Gaussian mixture model. This is a very simple
problem that enables us to clearly show and discuss the main features of the
proposed technique. As a practical application, we have also considered the
popular (and challenging) problem of estimating the rate parameters of
stochastic kinetic models (SKM). SKMs are highly multivariate systems that
model molecular interactions in biological and chemical problems. We introduce
a particularization of the proposed algorithm to SKMs and present numerical
results. Comment: 35 pages, 8 figure
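The abstract does not say which nonlinear transformation is applied to the importance weights. As an illustration only, the sketch below assumes a clipping transformation, which caps the largest unnormalized weights at a threshold, and compares the effective sample size before and after. The function names and the weight values are hypothetical.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def effective_sample_size(norm_weights):
    """Common ESS estimate: 1 / sum of squared normalized weights."""
    return 1.0 / sum(w * w for w in norm_weights)


def clip_weights(weights, n_clip):
    """Cap the n_clip largest unnormalized weights at the n_clip-th largest value."""
    threshold = sorted(weights, reverse=True)[n_clip - 1]
    return [min(w, threshold) for w in weights]


# Hypothetical unnormalized weights with one dominant sample (degeneracy).
raw = [1000.0, 10.0, 5.0, 2.0, 1.0]

ess_before = effective_sample_size(normalize(raw))
clipped = clip_weights(raw, n_clip=3)
ess_after = effective_sample_size(normalize(clipped))

print("clipped weights:", clipped)
print("ESS before:", ess_before)
print("ESS after: ", ess_after)
```

With these made-up weights the effective sample size rises from about 1 to about 4 out of 5 samples, which is the kind of reduction in weight variation the abstract describes. Clipping also alters the weights, so it changes the estimator as well as its variance.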
Utility of multispectral imaging for nuclear classification of routine clinical histopathology imagery
Background: We present an analysis of the utility of multispectral versus standard RGB imagery for routine H&E stained histopathology images, in particular for pixel-level classification of nuclei. Our multispectral imagery has 29 spectral bands, spaced 10 nm apart within the visual range of 420–700 nm. It has been hypothesized that the additional spectral bands contain further information useful for classification as compared to the 3 standard bands of RGB imagery. We present analyses of our data designed to test this hypothesis.
Results: For classification using all available image bands, we find the best performance (equal tradeoff between detection rate and false alarm rate) is obtained from either the multispectral or our "ccd" RGB imagery, with an overall increase in performance of 0.79% compared to the next best performing image type. For classification using single image bands, the single best multispectral band (in the red portion of the spectrum) gave a performance increase of 0.57%, compared to performance of the single best RGB band (red). Additionally, red bands had the highest coefficients/preference in our classifiers. Principal components analysis of the multispectral imagery indicates only two significant image bands, which is not surprising given the presence of two stains.
Conclusion: Our results indicate that multispectral imagery for routine H&E stained histopathology provides minimal additional spectral information for a pixel-level nuclear classification task compared with standard RGB imagery.